Generative Modeling Reveals the Connection between Cellular Morphology and Gene Expression

COSMIC (Cross-modal generation between Single-cell RNA-seq and MICroscopy images) is a bidirectional generative framework that quantitatively links single-cell nuclear morphology and gene expression. It captures the bidirectional flow of information between cellular form and gene expression, opening new avenues for mechanistic discovery and predictive modeling in both basic and translational cell biology.

How transcriptional programs give rise to cellular morphology, and how morphological features reflect and influence cell identity and function, remains poorly understood. This is due in part to the lack of large-scale datasets pairing the two modalities and to the absence of computational frameworks capable of modeling their cross-modal structure.

We introduce COSMIC, a bidirectional generative framework that quantifies how much transcriptional variance is reflected in morphology and how much morphological variance is explained by gene expression. COSMIC accurately models cell-type identity as well as continuous dynamics such as cell-cycle progression, establishing a quantitative link between morphological phenotypes and the underlying gene expression.

Publication

Generative Modeling Reveals the Connection between Cellular Morphology and Gene Expression
Shuo Wen, Ramon Viñas Torné, Johannes Bues, Camille Lucie Lambert, Nadia Grenningloh, Timothée Ferrari, Elisa Bugani, Joern Pezoldt, Jillian Rose Love, Wouter Karthaus, Bart Deplancke, and Maria Brbić

bioRxiv, 2026

@article {Wen2026.01.22.700673,
	author = {Wen, Shuo and Vi{\~n}as Torn{\'e}, Ramon and Bues, Johannes and Lambert, Camille Lucie and Grenningloh, Nadia and Ferrari, Timoth{\'e}e and Bugani, Elisa and Pezoldt, Joern and Love, Jillian Rose and Karthaus, Wouter and Deplancke, Bart and Brbi{\'c}, Maria},
	title = {Generative modeling reveals the connection between cellular morphology and gene expression},
	year = {2026},
	URL = {https://www.biorxiv.org/content/early/2026/01/24/2026.01.22.700673},
	journal = {bioRxiv}
}

Overview of COSMIC

COSMIC captures the relationship between gene expression and cellular morphology by learning to translate from transcriptomes to nuclear images and from nuclear images to transcriptomes. It builds on two conditional diffusion models trained on paired nuclear images and gene expression profiles. In each direction, COSMIC first encodes the input modality into a feature representation, then conditions the diffusion model on these features to generate the corresponding transcriptome or nuclear image.
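The conditioning idea can be illustrated with a minimal sketch. It assumes a DDPM-style noise-prediction objective with a linear noise schedule; the page only states that COSMIC uses conditional diffusion models, so the schedule, the loss, and every name below (`toy_denoiser`, `expr_features`, and so on) are illustrative assumptions, not the COSMIC implementation.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (a common DDPM default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form; returns (x_t, eps)."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps


def toy_denoiser(x_t, t, cond):
    """Hypothetical stand-in for a conditional network eps_theta(x_t, t, cond).

    A real model would be a neural network that uses `cond` (the encoded
    input modality); this placeholder simply predicts zero noise.
    """
    return [0.0 for _ in x_t]


def noise_prediction_loss(x0, cond, alpha_bars, rng, denoiser=toy_denoiser):
    """One Monte Carlo sample of the noise-prediction mean squared error."""
    t = rng.randrange(len(alpha_bars))
    x_t, eps = add_noise(x0, t, alpha_bars, rng)
    pred = denoiser(x_t, t, cond)
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(eps)


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
image = [0.5, -0.2, 0.1, 0.9]    # flattened toy "nuclear image"
expr_features = [1.0, 0.0, 0.3]  # toy encoded transcriptome (the condition)
loss = noise_prediction_loss(image, expr_features, alpha_bars, rng)
```

For the reverse direction, the roles swap in this sketch: the transcriptome plays the part of `x0` and the morphology features play the part of the condition.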

To build a general-purpose encoder of nuclear morphology, we trained a vision foundation model (FM) on microscopy images of cellular nuclei, referred to as the morphology FM. The model was trained on 21,784,309 nuclear images, obtained by segmenting and isolating individual nuclear crops from 50,377 whole-well Hoechst-stained microscopy images collected across different studies.
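The cropping step can be sketched as follows, assuming a segmentation has already produced an integer label mask (0 for background, k > 0 for nucleus k). The function name `nuclear_crops`, the fixed crop size, and the choice to zero out pixels of neighbouring nuclei are assumptions made for illustration; the actual preprocessing pipeline is described in the manuscript.

```python
def nuclear_crops(image, labels, size):
    """Return {label: size x size crop} centred on each labelled nucleus.

    image  : 2D list of pixel intensities
    labels : 2D list of ints, 0 = background, k > 0 = nucleus k
    Pixels outside the image, or not belonging to nucleus k, are set to 0.
    """
    height, width = len(image), len(image[0])

    # Accumulate row sum, column sum, and pixel count per nucleus.
    sums = {}
    for r in range(height):
        for c in range(width):
            k = labels[r][c]
            if k:
                acc = sums.setdefault(k, [0, 0, 0])
                acc[0] += r
                acc[1] += c
                acc[2] += 1

    crops = {}
    half = size // 2
    for k, (row_sum, col_sum, count) in sums.items():
        # Top-left corner of a window centred on the nucleus centroid.
        top = round(row_sum / count) - half
        left = round(col_sum / count) - half
        crop = []
        for r in range(top, top + size):
            row = []
            for c in range(left, left + size):
                inside = 0 <= r < height and 0 <= c < width
                row.append(image[r][c] if inside and labels[r][c] == k else 0)
            crop.append(row)
        crops[k] = crop
    return crops


# Toy 5 x 6 image containing two nuclei.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 7, 8, 0, 0, 0],
    [0, 9, 6, 0, 0, 5],
    [0, 0, 0, 0, 4, 3],
    [0, 0, 0, 0, 0, 0],
]
labels = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]
crops = nuclear_crops(image, labels, size=3)
```

Nucleus 2 touches the image border, so part of its crop is zero-padded.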

COSMIC generates realistic nuclear images from gene expression

We apply COSMIC to a dataset generated by IRIS. Specifically, we trained COSMIC on 4,520 paired nuclear image-transcriptome samples of mouse cells and evaluated it on an independent set of 4,519 samples. The mouse cells we profiled comprise embryonic fibroblasts (3T3), macrophage-like cells (RAW), CAR-engineered T cells derived from the A20 B-cell lymphoma model (CAR_A20), and primary naive CD8+ T cells (naive CD8).

COSMIC accurately synthesizes high-resolution nuclear images conditioned on single-cell transcriptomic profiles. Generated images closely match real nuclei in size, shape, and texture, and preserve cell-type-specific morphology. In the embedding space, real and generated nuclei overlap strongly. (In the following figure, BCL stands for B-cell lymphoma.)

Images generated by COSMIC retain sufficient biological signal to support accurate cell type classification, approaching the performance achieved on real microscopy images. Additionally, COSMIC generalizes to unseen batches, new donors, and even across species. (Experiment details are shown in the COSMIC manuscript.)
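One way to run this kind of comparison is to fit a classifier on embeddings of real images and then score it on embeddings of generated images and of held-out real images. The sketch below uses a nearest-centroid classifier purely as a simple stand-in; the classifier, the feature vectors, and the toy values are all assumptions for illustration and are not results or methods from the manuscript.

```python
def fit_centroids(features, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Label of the centroid closest to x (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def accuracy(centroids, features, labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, x) == y for x, y in zip(features, labels))
    return hits / len(labels)


# Toy 2D "embeddings"; real embeddings would come from an image encoder.
real_train = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
train_labels = ["3T3", "3T3", "RAW", "RAW"]
real_test = [[0.05, 0.0], [1.0, 0.95]]
generated = [[0.1, 0.2], [0.8, 0.9], [0.6, 0.4]]

centroids = fit_centroids(real_train, train_labels)
acc_real = accuracy(centroids, real_test, ["3T3", "RAW"])
acc_generated = accuracy(centroids, generated, ["3T3", "RAW", "RAW"])
```

The gap between `acc_real` and `acc_generated` then indicates how much cell-type signal the generated images lose relative to real ones.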

COSMIC predicts transcriptomic profiles from nuclear morphology

We next examine the performance of COSMIC in the reverse direction: predicting transcriptomic profiles from microscopy images of single cells. The dataset and data split are the same as in the previous section. In this reverse direction, COSMIC infers biologically meaningful transcriptomic profiles directly from nuclear images. Transcriptomes predicted by COSMIC recover the global structure of gene expression space and maintain clear separation between cell types.

At the gene level, COSMIC connects genes to nuclear morphology by identifying a subset of genes whose expression can be robustly predicted from morphology, including cell-type marker genes with high correlation to ground truth. These results demonstrate that nuclear morphology encodes precise information about specific transcriptional programs.
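A gene-level analysis of this kind can be sketched by computing, for each gene, the Pearson correlation between predicted and measured expression across cells, then keeping the genes above a cutoff. The cutoff of 0.5, the gene names, and the toy matrices are hypothetical; the manuscript defines the actual selection criteria.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def predictable_genes(true_expr, pred_expr, gene_names, threshold=0.5):
    """Per-gene correlation across cells (rows = cells, columns = genes).

    Returns (scores, selected), where `selected` lists genes whose
    correlation is at least `threshold`. NaN scores are never selected.
    """
    scores = {}
    for j, gene in enumerate(gene_names):
        truth = [row[j] for row in true_expr]
        pred = [row[j] for row in pred_expr]
        scores[gene] = pearson(truth, pred)
    selected = [gene for gene, r in scores.items() if r >= threshold]
    return scores, selected


# Toy data: 4 cells x 3 hypothetical genes.
gene_names = ["gene_a", "gene_b", "gene_c"]
true_expr = [[1, 5, 2], [2, 3, 2], [3, 4, 2], [4, 1, 2]]
pred_expr = [[1.1, 2, 0.3], [1.9, 4, 0.1], [3.2, 1, 0.5], [3.8, 5, 0.2]]
scores, selected = predictable_genes(true_expr, pred_expr, gene_names)
```

In this toy example, `gene_a` is well predicted, `gene_b` is anti-correlated, and `gene_c` has constant measured expression, so its correlation is undefined.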

COSMIC captures continuous cell-cycle dynamics

Beyond discrete cell types, COSMIC learns continuous biological processes. In mouse fibroblasts, COSMIC accurately recovers cell-cycle dynamics from both modalities. Predicted transcriptomes reproduce phase-specific expression patterns of canonical cell-cycle genes, while generated nuclear images reflect expected morphological changes across the cell cycle, such as systematic variation in nuclear size.

COSMIC identifies morphology-associated genes in cancer

We apply COSMIC to prostate cancer cells treated with the chemotherapy drug docetaxel. COSMIC identified morphological and transcriptomic differences between cells that respond to the treatment and cells that resist it, and revealed morphology-associated genes linked to tumor state.

Code

A PyTorch implementation of COSMIC is available on GitHub.

Contributors

The following people contributed to this work:

Shuo Wen

Ramon Viñas Torné

Johannes Bues

Camille Lucie Lambert

Nadia Grenningloh

Timothée Ferrari

Elisa Bugani

Joern Pezoldt

Jillian Rose Love

Wouter Karthaus

Bart Deplancke

Maria Brbić